Paying attention to speaking rate

Authors

  • Alexander L. Francis
  • Howard C. Nusbaum
Abstract

Variability in speaking rate results in a many-to-many mapping between acoustic properties in speech and the linguistic interpretation of an utterance. In order to recognize the phonetic structure of an utterance, listeners must calibrate their phonetic decisions against the rate at which the speech was produced. This process of rate normalization is fast and effective, allowing listeners to maintain phonetic constancy in spite of changes in speaking rate. Most of the research on rate normalization has investigated the sources of information used by listeners to determine the speaking rate. There is an assumption in much of this research that the normalization process is a passive, automatized filtering process that strips the effects of rate variation away from the signal prior to recognition. The present study starts from a different perspective by assuming that speech perception is carried out by an active perceptual process that is specifically needed to address the lack of invariance problem (Nusbaum & Henly, in press). This perspective predicts that increased variability from any source, including rate variability, should increase the cognitive load during speech perception. Our results support this prediction.

1. SPEAKING RATE VARIABILITY

Listeners are constantly exposed to a wide range of speaking rates. Talkers differ in their characteristic speaking rates. Individual talkers will also vary their speaking rate, sometimes even within a single utterance. This variation affects the acoustic patterns of speech by restructuring the relationship between acoustic cues and phonetic categories (see Miller, 1981). The durations of cues change nonlinearly with speaking rate and there may also be changes in spectral patterns. This results in a lack of invariance in mapping acoustic cues onto phonetic categories (cf. Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967).
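
To make the lack of invariance concrete, the following is a minimal sketch of a duration cue (for instance, the formant-transition duration that separates /b/ from /w/) judged either against a fixed criterion or relative to a crude proxy for speaking rate. Everything in it is an assumption made for illustration: the function names, the 40 ms boundary, the 0.2 ratio, and the syllable durations are invented and are not taken from the study, and a simple ratio is used even though cue durations change nonlinearly with rate.

```python
# Toy illustration only: all numeric values are invented, not measured data.

def classify_fixed(transition_ms: float, boundary_ms: float = 40.0) -> str:
    """Label a transition using one rate-independent duration boundary."""
    return "b" if transition_ms < boundary_ms else "w"


def classify_rate_relative(transition_ms: float, syllable_ms: float,
                           boundary_ratio: float = 0.2) -> str:
    """Label a transition relative to syllable duration (a crude rate proxy)."""
    return "b" if transition_ms / syllable_ms < boundary_ratio else "w"


# The same 36 ms transition heard in a fast (short) and a slow (long) syllable.
fast = classify_rate_relative(36.0, syllable_ms=120.0)  # ratio 0.30
slow = classify_rate_relative(36.0, syllable_ms=300.0)  # ratio 0.12
fixed = classify_fixed(36.0)                            # ignores rate entirely

print(fast, slow, fixed)
```

In this sketch the identical acoustic duration receives different labels once rate is taken into account, whereas the fixed criterion returns the same label regardless of context.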
At one speaking rate, the distribution of acoustic properties that denote different phonetic categories may overlap substantially; across different speaking rates this overlap becomes much more extreme (e.g., Miller & Baer, 1983). Transition durations that correspond to /w/ at a fast speaking rate might correspond to /b/ at a slow speaking rate. This means that in order to interpret a particular acoustic property as the intended phonetic segment, listeners must know something about the speaking rate at which the utterance was produced. Listeners must calibrate their phonetic decisions against the context of the speaking rate, or in other words, they must normalize rate differences.

There is evidence that listeners derive information about speaking rate from the context of the utterance in which a particular cue is judged. This information may come from the overall structure of the sentence (e.g., Gordon, 1988; Miller & Grosjean, 1981) or from the structure of the syllable carrying the target property (e.g., Miller & Liberman, 1979; Newman & Sawusch, 1996; Port & Dalby, 1982). However, although research has investigated some of the sources of information listeners use in rate normalization, there are few explicit models of rate normalization.

Miller and Liberman (1979) emphasized that the perceiver was affected by the articulatory rate rather than the physical rate represented by the acoustic durations. If rate normalization is based on the underlying articulatory characteristics of speech rather than on the acoustic properties of the signal, this suggests that listeners might normalize speaking rate differences by using an approach based on motor theory (cf. Liberman & Mattingly, 1985). This would emphasize the role of articulatory knowledge in resolving the lack of invariance problem resulting from variation in speaking rate. By contrast, Pisoni, Carrell, and Gans (1983) questioned whether this rate normalization effect was based on articulatory knowledge.
Using nonspeech analogs of Miller and Liberman’s (1979) speech stimuli, they found the same pattern of results as those reported by Miller and Liberman. However, since the nonspeech analogs were not heard as speech, these stimuli must have been immune from the operation of articulatory knowledge. In addition, Gordon (1988) demonstrated that a time-varying sinusoid matched to the F0 of a context sentence produced changes in classification of speech stimuli consistent with rate normalization, even though this sinusoid was not perceived as speech. These kinds of experiments suggest that rate normalization is a function of more general auditory mechanisms rather than specific to articulatory knowledge.

Regardless of whether the claim is that listeners use articulatory knowledge or general auditory mechanisms, it is interesting to note that the difference in theoretical perspective is based on the type of information that is being processed. This concern with the type of information or knowledge used to resolve the lack of invariance in mapping acoustic cues to phonetic categories has typically framed much of the theoretical debate in speech research. The contrast between articulatory theories and auditory theories of speech perception is well known. Furthermore, theories of talker normalization have also generally focused on this issue of determining the sources of information and the nature of knowledge needed to overcome the effects of talker variability.

2. TALKER VARIABILITY

There is much in common between the problems of rate variability and talker variability. Differences among talkers result in overlapping distributions of acoustic cues such that any particular acoustic pattern may correspond to more than one phonetic category and one phonetic category may be cued by several different acoustic patterns (e.g., Peterson & Barney, 1952).
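
The two directions of this overlap can be sketched with a small lookup structure. This is only an illustration under stated assumptions: the pattern, vowel, and talker labels are hypothetical placeholders, not data from Peterson and Barney (1952) or from the present study.

```python
from collections import defaultdict

# Hypothetical cue-to-category observations; all labels are invented placeholders.
observations = [
    ("pattern_A", "vowel_1"),
    ("pattern_B", "vowel_1"),  # two patterns cue one category (many-to-one)
    ("pattern_B", "vowel_2"),  # one pattern cues two categories (one-to-many)
    ("pattern_C", "vowel_2"),
]

pattern_to_categories = defaultdict(set)
category_to_patterns = defaultdict(set)
for pattern, category in observations:
    pattern_to_categories[pattern].add(category)
    category_to_patterns[category].add(pattern)

# Patterns that cannot be classified from the acoustics alone.
ambiguous = sorted(p for p, cats in pattern_to_categories.items() if len(cats) > 1)
print(ambiguous)

# With (hypothetical) talker context as part of the key, each lookup
# has exactly one answer again.
by_talker = {
    ("talker_X", "pattern_B"): "vowel_1",
    ("talker_Y", "pattern_B"): "vowel_2",
}
print(by_talker[("talker_Y", "pattern_B")])
```

The many-to-one direction is harmless for a plain lookup, since each such pattern still resolves to a single category. Only the pattern with several candidate categories needs additional context before it can be labeled.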
As a result, in order to interpret an acoustic pattern as the intended phonetic segment, it is necessary for a listener to know something about the vocal characteristics of the talker. Just as the listener appears to derive information about the speaking rate from the overall context of the utterance or the intrinsic structure of the syllable that is being recognized, listeners also appear to use extrinsic context and intrinsic structure to derive information about the vocal characteristics of the talker (Ainsworth, 1975; Nearey, 1989).

Despite the apparent similarities in these problems, there has been little in common in the theoretical perspectives taken on rate normalization and talker normalization. Theories of talker normalization have focused specifically on the problem of vocal tract scaling (cf. Fant, 1973). For example, models of talker normalization have been proposed using extrinsic information about point vowels derived from prior context (Gerstman, 1968) or using the intrinsic structure of the vowel including F0 and F3 (Syrdal & Gopal, 1986) to scale F1 and F2 for talker-independent interpretation. Since these theories are concerned with scaling spectral patterns in the context of vocal tract differences, these mechanisms seem irrelevant to explaining the process of temporal scaling in rate normalization.

However, Nusbaum and Magnuson (in press) have argued that there is an important, equivalent computation-theoretic structure to all of the manifestations of the lack of invariance in acoustic-phonetic mapping. They distinguish, on computation-theoretic grounds, the case in which one linguistic interpretation has multiple alternative acoustic instantiations and the case in which one acoustic cue maps onto multiple linguistic interpretations. The first of these can be processed by any simple deterministic finite state automaton and therefore nearly any kind of simple computational device and poses no real theoretic problem.
The second case is different since it represents a basic nondeterministic relationship between patterns and the classification of those patterns which requires a different kind of computational solution. Nusbaum and Magnuson (in press) argued that this kind of computational problem cannot be resolved by changing the nature of the knowledge used in processing but instead requires a different computational control structure--an actively controlled computational mechanism (see Nusbaum & Schwab, 1986).

One hallmark of an active computational mechanism is that the listener’s cognitive load should be affected by the computational demands on an active mechanism (Nusbaum & Schwab, 1986). Several studies have demonstrated that talker variability increases recognition time (Mullennix & Pisoni, 1990; Nusbaum & Morin, 1992; Summerfield & Haggard, 1975) and that this increased response time is due to increased cognitive load when there is talker variability (Mullennix & Pisoni, 1990; Nusbaum & Morin, 1992). These results suggest that listeners may use an actively controlled process to recognize speech when there is talker variability (Nusbaum & Magnuson, in press). Nusbaum and Henly (in press) have suggested that this may reflect the operation of a more general set of cognitive principles that govern perception whenever there is lack of invariance in acoustic-phonetic relationships. If this more general claim is valid, and speaking rate variability represents the same kind of lack of invariance problem, in computational terms of a nondeterministic mapping between cues and categories, we should find that rate normalization also increases the cognitive load on the listener.

3. ATTENTION AND RATE NORMALIZATION

Although the attentional demands of rate normalization have not been investigated explicitly, there is some evidence suggesting that rate normalization may be a passive filtering process rather than an active attention-demanding process.
Miller, Green, and Schermer (1984) argued that rate normalization is obligatory and not under active attentional control since listeners could not strategically avoid using rate information from a context sentence in phonetic classification, even though they could ignore semantic information. Similarly, Miller and Dexter (1988) found that listeners could not avoid using rate information in classifying segments, but they could ignore lexical status under certain task constraints. When listeners respond quickly, they still carry out rate normalization during phonetic classification even when they do not use other sources of higher-order linguistic knowledge. However, one finding in the Miller and Dexter (1988) study is at odds with the idea of a passive, automatized rate normalization mechanism: Listeners used different sources of rate information under different task constraints. This kind of flexibility in shifting attention to different sources of information is more consistent with an active perceptual mechanism (see Nusbaum & Schwab, 1986). Although these studies are suggestive, they were not designed to test specifically the question of whether rate normalization is a passive, automatized process or an active, controlled process. A more direct test of this question would assess whether rate variability increases the listener’s cognitive load in the same way as has been observed with talker variability (Nusbaum &





Publication date: 1996